Speech recognition in noise by using word graph combinations
Authors
Abstract
In practice, the performance of speech recognition systems degrades when speech signals are corrupted by background noise in the environment. In this paper, we propose a new word graph combination (WGC) approach for speech-in-noise recognition. The aim of this work is to develop a method that ensures robust speech recognition under various noise conditions, in particular under the adverse effects of environmental and impulsive noise. For this purpose, we developed a WGC technique in which both continuous-mixture hidden Markov models (CMHMMs) and discrete-mixture hidden Markov models (DMHMMs) are used as acoustic models. It has previously been verified that a DMHMM-based system yields significant improvements in speech recognition performance under impulsive noise conditions. We also showed that a CMHMM-based system performs better under high-SNR and environmental noise conditions. On the basis of these findings, we adopted a system combination approach that uses both a DMHMM and a CMHMM. Complementary effects can be anticipated from the proposed method because the CMHMM and the DMHMM exhibit different error trends. Among the existing combination methods, which include recognizer output voting error reduction (ROVER) and confusion network combination (CNC), we selected WGC. Unlike conventional combination approaches such as ROVER and CNC, WGC preserves the timing information of all word hypotheses. In our speech recognition experiments, the proposed system outperformed both the ROVER-based system and the baseline system. In particular, it showed comparatively higher performance under mixed noise conditions.
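The abstract does not describe the combination algorithm itself, so the following is only a minimal illustrative sketch of the general idea that word hypotheses from two recognizers can be merged while keeping their start and end times. Every name here (`Edge`, `overlap_ratio`, `combine_word_graphs`, `weight_a`, `min_overlap`) and the score-interpolation rule are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One word hypothesis in a word graph (hypothetical structure)."""
    word: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float  # posterior-like confidence in [0, 1]


def overlap_ratio(a: Edge, b: Edge) -> float:
    """Temporal intersection divided by the total time span covered."""
    inter = min(a.end, b.end) - max(a.start, b.start)
    span = max(a.end, b.end) - min(a.start, b.start)
    return max(inter, 0.0) / span if span > 0 else 0.0


def combine_word_graphs(graph_a, graph_b, weight_a=0.5, min_overlap=0.5):
    """Merge two lists of word edges (assumed interpolation rule).

    Edges with the same word and sufficient time overlap are merged and
    their scores interpolated; all other edges are kept with a
    down-weighted score. Every output edge keeps its timing.
    """
    weight_b = 1.0 - weight_a
    merged = []
    used_b = set()
    for a in graph_a:
        score = weight_a * a.score
        for j, b in enumerate(graph_b):
            if (j not in used_b and b.word == a.word
                    and overlap_ratio(a, b) >= min_overlap):
                score += weight_b * b.score
                used_b.add(j)
                break
        # The merged edge keeps the timing of the edge from graph_a.
        merged.append(Edge(a.word, a.start, a.end, score))
    # Edges found only in graph_b are kept as well.
    for j, b in enumerate(graph_b):
        if j not in used_b:
            merged.append(Edge(b.word, b.start, b.end, weight_b * b.score))
    return sorted(merged, key=lambda e: (e.start, e.end, e.word))
```

In this sketch, a word proposed by both systems at roughly the same time ends up with a higher combined score than a word proposed by only one, and competing hypotheses that share a time span remain side by side with their timing intact, which is the property the abstract contrasts with ROVER and CNC.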
Similar resources
Effects of ageing on speed and temporal resolution of speech stimuli in older adults
Background: According to previous studies, most speech recognition disorders in older adults result from deficits in audibility and auditory temporal resolution. In this paper, the effect of ageing on time-compressed speech and auditory temporal resolution was studied using word recognition in continuous and interrupted noise. Methods: A time-compressed speech test (TCST) w...
Envelope-based inter-aural time difference localization training to improve speech-in-noise perception in the elderly
Background: Many elderly individuals complain of difficulty in understanding speech in noise despite having normal hearing thresholds. According to previous studies, auditory training leads to improvement in speech-in-noise perception, but these studies did not consider the etiology, so their results cannot be generalized. The present study aimed at investigating the effectiveness of envelope-b...
Effect of signal to noise ratio on the speech perception ability of older adults
Background: Speech perception ability depends on auditory and extra-auditory elements. The signal-to-noise ratio (SNR) is an extra-auditory element that affects the ability to follow speech normally and maintain a conversation. Difficulty with speech-in-noise perception is a common complaint of the elderly. In this study, the importance of SNR magnitude as an extra-auditory effect on speech...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing, deleted, and then replaced using the remaining components and statistical ...
Psychoacoustics and speech perception in individuals with auditory neuropathy and in normal individuals
Background: The main result of hearing impairment is a reduction in speech perception. Patients with auditory neuropathy can hear, but they cannot understand. Their difficulties have been traced to timing-related deficits, revealing the importance of the neural encoding of timing cues for understanding speech. Objective: In the present study psychoacoustic perception (minimal noticeable differen...
Journal title:
Volume, issue
Pages -
Publication date: 2010